Abstract


External accountability policies have spread rapidly across educational systems over the past decades. This research examines the relations of internal and external accountability with students’ math achievement, drawing on PISA 2012. With a sample of 44 educational systems whose external accountability policies were identified, the research conducted three-level hierarchical linear modelling (HLM) analyses. Some internal accountability factors were more closely related to math achievement, while the relations of external accountability policies with student performance were rather tenuous. However, equity of student math achievement was better ensured under strong accountability systems. The results suggest that policy makers in each country should consider the strengths and weaknesses of external accountability in their own educational contexts.

Keywords: external accountability, educational equity, internal accountability, math achievement, 
PISA. 


Introduction 


The past two decades have seen educational accountability combined with student assessment spread rapidly across educational systems worldwide. Although there are substantial cross-national variations in specific policy measures, educational policy makers have increasingly adopted standards- and performance-based educational reforms accompanied by national assessment, motivated by growing international educational testing and the global new public management trend (Kamens & McNeely, 2010; Morris, 2011). England and the U.S. were forerunners in performance-based accountability policies, and it did not take long for many countries in distant world regions, for example, South Korea, Australia, Norway, and Sweden, to follow suit (Elstad, 2009; Lingard, 2010; Lundahl & Waldow, 2009; Sung & Kang, 2012). Scholars suggest that this transnational policy diffusion was accelerated by OECD PISA, among others (Meyer, Tröhler, Labaree, & Hutt, 2014; Ozga, 2013).

The global policy convergence toward performance-based accountability in education makes the effects of external accountability policies on student achievement a significant research issue with important policy implications. With its growing stature in educational policy reforms across nations, the pervasive external accountability movement has become a global educational policy issue that needs rigorous research evidence. Given the significance of policy concerns about this topic, a growing number of empirical studies have accumulated, but study results remain mixed and inconclusive.

Previous research, conducted predominantly in the U.S. and U.K. contexts, has not reached a consensus about the effectiveness of external accountability, which features state-wide tests, public reporting of test results, and rewards or sanctions based on the test results. Some scholars found significantly positive effects of external accountability (Carnoy & Loeb, 2002; Dee & Jacob, 2011; Hanushek & Raymond, 2005), and a meta-analysis on the effect of test-driven external accountability reported a modestly positive effect on average achievement (Lee, 2008). Yet others found no significant effect of external accountability policies, particularly with little equity enhancement (Lee & Reeves, 2012; Lee & Wong, 2004). Some cross-country studies suggested that student test scores were significantly higher in nations with external exit exams (Bishop, 1998; Schütz, Lüdemann, Woessmann, & West, 2010; Woessmann, 2005), while Lee and Amo (2017) reported no significant growth of student achievement in nations with high school exit exam policies compared with nations without them.

On the other hand, many scholars contended that external accountability alone would not 
be linked to improved, long-term student learning outcomes, and that internal accountability
and organizational capacity should precede external accountability (Elmore, 2004; Newmann, 
King, & Rigdon, 1997; O’Day, 2002). However, these critics of externally driven educational 
accountability policy tended to put forth their arguments based on qualitative case-study results. 

This research aims to reveal multilevel relations among external accountability of educational systems, internal accountability within schools, and students’ math achievement. Math was the main subject of PISA 2012, and accordingly many of its background questions concerned math.


Literature Review 
Concepts of External vs. Internal Accountability 


School accountability, in which school performance is evaluated using student perfor- 
mance measures, is increasingly prevalent around the world (Figlio & Loeb, 2011). Although 
there are various conceptions of educational accountability, such as political, legal, bureaucrat- 
ic, professional, moral, and market accountability (Adams & Kirst, 1999), performance-based, 
or test-based accountability has predominantly driven educational reform policies in many edu- 
cational systems for the past decades. Under the external accountability system, schools are 
externally or outwardly accountable for student academic performance, which is sometimes 
published for public information, and based on which schools are rewarded or penalized. Gen- 
erally, school accountability systems include three elements: state-wide student tests, public 
reporting of school performance, and rewards or sanctions based on some measures of school 
performance or improvement (Kane & Staiger, 2002). Examples of external accountability policies include the No Child Left Behind (NCLB) Act in the U.S. and the National Assessment Program – Literacy and Numeracy (NAPLAN) in Australia.

In contrast to external accountability, many scholars have emphasized internal, school-oriented accountability, in which standards for school performance and development are professionally negotiated (Adams & Kirst, 1999; Newmann, King, & Rigdon, 1997). While an external accountability system imposes standards from outside and mostly lacks a capacity-building component, an internal accountability model establishes performance standards through professional negotiation and recognizes site needs for capacity building. Most importantly, under internal accountability models, alignment between site and system standards is possible and the impact on teaching is direct.

Elmore (2004) conceptualized internal accountability as holding teachers accountable for student learning in line with personal responsibility and shared expectations, combined with certain consequences. In other words, teachers and schools are accountable for student learning, the standards for which are shared in the school community. When the shared expectations are not met or internalized by teachers, there should be internal consequences. Elmore and Fuhrman (2001) argued that internal accountability should precede external accountability in school reforms for genuine school improvement.


Relations of External Accountability with Student Achievement 


In theory, the effectiveness of external accountability in education is premised on a principal-agent model, in which the parent is a principal who commissions school teachers as agents to educate the child on his or her behalf (Wößmann, Lüdemann, Schütz, & West, 2007). Because of the asymmetric information problem concerning the efforts of teachers, external accountability measures such as high-stakes testing, public reporting of school results, and rewards or penalties based on the results are supposed to incentivize teachers to make efforts.

In practice, empirical investigation of the effects of external accountability on student achievement has been carried out mainly in the U.S. domestic context and in international comparisons, although there are some other country cases. The U.S. cross-state analyses used the National Assessment of Educational Progress (NAEP) results to identify the effects of the NCLB on student achievement (Carnoy & Loeb, 2002; Dee & Jacob, 2011; Hanushek & Raymond, 2005; Lee & Reeves, 2012; Lee & Wong, 2004).

The US-based empirical studies revealed that evidence was mixed: some studies reported 
significantly positive effects of external accountability, particularly on math achievement (Car- 
noy & Loeb, 2002; Dee & Jacob, 2011; Hanushek & Raymond, 2005), while others suggested 
that the NCLB did not generate sustainable and generalizable policy effects, particularly with 
little equity enhancement (Lee & Reeves, 2012; Lee & Wong, 2004). 

Hanushek and Raymond (2005) argued for consequential accountability policies based 
on their findings that reporting results alone had minimal impact on student performance. By 
contrast, Burgess, Wilson, and Worth (2013) found that the abolition of school performance tables negatively affected school effectiveness in Wales, the U.K. This finding suggests that public reporting alone could affect school efforts.

In a meta-analysis, Lee (2008) concluded that the high-stakes accountability policy 
showed a modestly positive effect on average achievement. Figlio and Ladd (2008), in their 
review of literature, suggested that school accountability policy seemed more effective in math 
than in reading with a modest effect size in general, and it hardly reduced the achievement gap, 
particularly between White and Black students. 

Recently, Lee and Amo (2017) found that neither student accountability policy (high 
school exit exam with consequences for students) nor school accountability policy (high-stakes 
testing with consequences for teachers and schools) affected average achievement growth in 
grade 8 math of NAEP. In addition, they found that neither student-targeted nor school-targeted accountability policy closed the achievement gaps in the U.S.

International cross-country studies using PISA data have provided fairly positive evi- 
dence on the relations of accountability measures with student achievement. Hanushek and 
Woessmann (2010) suggested that accountability was an important institutional feature that 
contributed to higher student performance. For example, students in systems with central exit examinations were more likely to perform better (Jürges, Richter, & Schneider, 2005; Woessmann, 2005), and public posting of school performance was positively related with student achievement (Boarini & Lüdemann, 2009). Accountability measures aimed at teachers and schools were positively associated with student achievement (Schütz, Lüdemann, Woessmann, & West, 2010). In particular, external accountability was effective when combined with autonomy (Hanushek, Link, & Woessmann, 2013). In addition, the relations of various accountability
measures with student achievement did not significantly differ for students with different SES 
(Schuetz, Luedemann, West, & Woessmann, 2013). 

Recently, Lee and Amo (2017) indicated that school accountability policy was ineffective from an international comparative perspective. That is, initially low-performing countries were more likely to adopt stronger school accountability policies, which later did not contribute to more academic progress or to closing the achievement gaps within countries. Although the study has the methodological limitation of a small sample size, which entails a lack of statistical power to detect policy effects, its consistent findings on the ineffectiveness of external accountability policies are worth noting.


Relations of Internal Accountability with Student Achievement 


Critics of external accountability policies suspected that externally imposed accountability might not ensure school improvement, which actually requires internal organizational capacity (Elmore & Fuhrman, 2001; Newmann et al., 1997; O’Day, 2002; Vanhoof & Petegem, 2007). They recognized the importance of internal accountability for bona fide school improvement. Furthermore, some scholars proposed professional accountability based on trust as an alternative to external accountability to realize school reforms (Møller, 2009; O’Neill, 2013; Sahlberg, 2008). However, those who oppose external accountability and emphasize internal accountability have tended to base their propositions on conceptual grounds rather than on empirical evidence.

An empirical study that combines external and internal accountability was conducted 
by Lee and his colleagues (2014), who attempted to disentangle the relations between external 
standards, internal standards and student achievement using longitudinal data of the U.S. In 
this study, internal standards are translated as teacher expectation considering students’ prior 
achievement and background. They found that the linkage between state standards and student 
achievement was tenuous, whereas the linkage between teacher standards and student achieve- 
ment was solid and reciprocal. 

Elmore (2004), based on school case studies, concluded that schools with strong internal 
accountability functioned more effectively under external accountability pressures. He writes, “Strong internal accountability is a condition that precedes and determines a school’s response to external accountability” (Elmore, 2004, p. 134). There are few empirical analyses of the relations of internal accountability as a comprehensive concept with student achievement. Rather, researchers have explored how each component of internal accountability, such as principal leadership and teacher morale, is related to student performance. For example, Hallinger and Heck
(1998) reviewed literature on principal leadership and student achievement and reported that 
school principals have indirect, yet statistically significant influence on student achievement. 
Leithwood and Day (2008) claimed that principal leadership affects student outcomes indirectly 
and most powerfully through staff motivation. The lack of empirical studies comparing the re- 
lations of external versus internal accountability with student achievement using international 
data justifies the significance of this study. 


Research Questions 


This research examines the effectiveness of external accountability policies in terms of 
the level of math achievement and equity of achievement. It first hypothesizes that internal 
accountability is strongly associated with student math achievement, while the relations be- 
tween external accountability and student math achievement are weak. With respect to equity 
of achievement, previous research provided inconsistent findings. Thus, this research explores 
how external accountability policies at the system level are related with the effects of SES on 
math achievement. Specifically, we address the following research questions:

1. Are external accountability policies at the system level and internal accountability components at the school level associated with students’ math achievement, and if so, how strongly?

2. Is a strong external accountability system effective in narrowing math achievement 
gaps of students from different SES groups? 

Methodology of Research
Data and Sample 


This research is a quantitative secondary analysis of the PISA 2012 data that examines whether and how much external and internal accountability are related to student achievement across different educational systems. The OECD has conducted large-scale assessments of mathematical, reading, and scientific literacy among 15-year-old students every three years since 2000. PISA used a stratified sample design in all countries (OECD, 2014a). The sampling units were schools with 15-year-old students, drawn from a comprehensive national list of all PISA-eligible schools in the 65 countries participating in PISA 2012. Compared to the previous cycles, PISA 2012 provides
richer information on school accountability. For example, measures of quality assurance, prin- 
cipal leadership behaviors, and consequences of teacher evaluation are included in the school 
questionnaire. Hence, it provides an opportunity to examine multilevel relations between ex- 
ternal accountability at the system level, internal accountability at the school level, and student 
achievement at the individual level. 

The sample selection process was as follows. Among the 65 education systems that originally participated in PISA 2012, the status of national-level accountability policies was first identified for 46 systems through extensive literature reviews. Then, the acquired information was confirmed with the national project managers of PISA 2012 of each country via email in January and February 2017. With a slight revision of information based on the email correspondence, 44 systems were finally selected for the analyses of this research. Two systems, the U.K. and Shanghai-China, were excluded from the final analysis because of a severe missing data problem. The sample restrictions resulted in final sample sizes of 314,327 students and 12,183 schools from 44 educational systems. For comparisons of high- and low-external accountability systems as of 2012, two groups were selected: 6 countries that adopted all three external accountability measures (Australia, Chile, Hungary, Korea, Mexico, and the USA) and 8 systems with no external accountability policies (Switzerland, Spain, Finland, Greece, Croatia, Liechtenstein, Lithuania, and Macao-China).


Variables 


First, the dependent variable of this study is individual students’ math achievement in 
PISA 2012, using 5 plausible values. 
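
Because PISA reports five plausible values per student rather than a single score, a statistic is typically estimated once per plausible value and the five results are then pooled. The following Python sketch illustrates one common pooling rule (Rubin's rules for multiply imputed data). It is an illustration under stated assumptions, not the exact procedure run for this study; the function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_plausible_values(estimates, sampling_variances):
    """Pool one statistic that was estimated separately on each plausible value.

    Assumption: Rubin's rules are an adequate summary of the pooling step.
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within_var = mean(sampling_variances)   # average sampling variance
    between_var = variance(estimates)       # variance across plausible values (divisor m - 1)
    total_var = within_var + (1 + 1 / m) * between_var
    return pooled_estimate, total_var ** 0.5


# Hypothetical coefficient and standard error obtained from each of five plausible values.
pv_estimates = [9.1, 9.5, 9.4, 9.6, 9.4]
pv_std_errors = [1.4, 1.5, 1.5, 1.4, 1.5]

pooled, pooled_se = pool_plausible_values(
    pv_estimates, [se ** 2 for se in pv_std_errors]
)
print(f"pooled estimate = {pooled:.2f}, pooled SE = {pooled_se:.2f}")
```

The pooled standard error is never smaller than the square root of the average sampling variance, because the disagreement among the plausible values is added as a second source of uncertainty.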

Second, external accountability was measured by three indicators: 1) whether national testing exists (NATIONTE), 2) whether school performance is reported publicly, for example, on a website (REPORTIN), and 3) whether there are sanctions and rewards based on student performance (SANREW). For national testing, implementation of either standards-based tests or exit exams was considered. That is, national testing was coded 1 if either standards-based tests or a central exit exam was in place and implemented in the educational system as of 2012. External accountability measures may include strong pressures such as the threat of reconstitution, principal transfer, and loss of students (Carnoy & Loeb, 2002), but this research confined external accountability measures to three simplified variables, following Kane and Staiger (2002).
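
The coding rule for the three system-level indicators can be sketched in Python as follows. This is a minimal illustration, not the authors' coding script: the `SystemPolicy` record, its field names, and the "intermediate" label for systems with one or two measures are hypothetical, while the variable names NATIONTE, REPORTIN, and SANREW and the high/low grouping (all three measures versus none) follow the text.

```python
from dataclasses import dataclass


@dataclass
class SystemPolicy:
    """Hypothetical record of one educational system's policies as of 2012."""
    standards_based_test: bool
    central_exit_exam: bool
    public_reporting: bool
    sanctions_or_rewards: bool


def code_external_accountability(policy):
    """Return the three 0/1 indicators used at the system level."""
    return {
        # National testing counts if either kind of test is in place.
        "NATIONTE": int(policy.standards_based_test or policy.central_exit_exam),
        "REPORTIN": int(policy.public_reporting),
        "SANREW": int(policy.sanctions_or_rewards),
    }


def classify_system(codes):
    """High = all three measures, low = none; 'intermediate' is an assumed label."""
    total = sum(codes.values())
    if total == 3:
        return "high"
    if total == 0:
        return "low"
    return "intermediate"


# Hypothetical system with an exit exam only.
example = code_external_accountability(
    SystemPolicy(standards_based_test=False, central_exit_exam=True,
                 public_reporting=False, sanctions_or_rewards=False)
)
print(example, classify_system(example))
```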

The school questionnaire of PISA 2012 included a few questions indicating system-level accountability, such as whether student assessments are used to compare the school to district or national performance and whether achievement data are posted publicly. However, it is uncertain whether state-wide tests and public reporting of school performance were actually institutionalized in an educational system, since there was substantial variation across the schools that answered the questionnaire. Moreover, the PISA background questionnaire did not provide information about whether there are sanctions and rewards based on test results. Thus, extensive reviews of documents from the OECD, UNESCO, the World Bank, the EU, national governments, and official websites, in addition to scholarly articles, were conducted to identify the status of external accountability systems.

Third, we constructed internal accountability measures based on Elmore’s (2004) concept of internal accountability, which consists of teachers’ individual responsibility aligned with collective expectations and internal consequences when misalignment is detected. The first component, individual responsibility from the perspective of teachers, can be proxied by the teacher morale factor in PISA (TCMORALE). The questionnaire asks the respondent (the principal) how much they agree with the following statements: 1) the morale of teachers is high; 2) teachers work with enthusiasm; 3) teachers take pride in this school; and 4) teachers value academic achievement. The second component, collective expectations, can be measured by parental expectations and participation in school activities, teacher monitoring, and principal leadership. PISA 2012 includes a question about parental expectations towards the school, asking how much pressure the school receives from parents about high academic standards (PAREXPEC). It also asks about the proportions of parents who participate in a variety of school-related activities (PARPART). For teacher monitoring, a question about methods of monitoring the practice of math teachers was used (TMONITOR). Regarding principal leadership, we used four factor variables provided by the OECD. The school questionnaire for PISA 2012 contained 21 items about school leadership activities, from which 4 factor variables are derived: LEADCOM, LEADINST, LEADPD, and LEADTCH (OECD, 2014a, pp. 345-346). LEADCOM represents 4 items on framing and communicating the school’s goals and curricular development; LEADINST, 3 items on instructional leadership; LEADPD, 3 items on promoting instructional improvements and professional development; and LEADTCH, 3 items on teacher participation in leadership. For the third component, internal consequences (CONSEQ), we used the question concerning teacher appraisals. The question reads, “To what extent have appraisals of and/or feedback to teachers directly led to a change in salary, a financial bonus or another kind of monetary reward, opportunities for professional development activities, a change in the likelihood of career advancement, public recognition, changes in work responsibilities that make the job more attractive, and a role in school development initiatives?”

Finally, in the multilevel model specification, we included a few key control variables that could influence student achievement at the student, school, and system levels. On the basis of prior empirical studies, we included gender, SES, and math self-efficacy as control variables at the student level. School type (public or private) and school average SES were controlled for at the school level. At the system level, logged per capita GDP (LNGDP) was controlled for.


Data Analysis 


Three-level hierarchical linear modeling (HLM) was used to estimate the relations between external and internal accountability and students’ achievement, controlling for variables
affecting student achievement. This multilevel analytical method was chosen because the data 
have a hierarchical structure with individual students nested within schools within educational 
systems (Raudenbush & Bryk, 2002). The level 1, level 2, and level 3 models are as follows: 


Unconditional Model

Level 1 model (student level):

\[ Y_{ijk} = \pi_{0jk} + e_{ijk}, \qquad e_{ijk} \sim N(0, \sigma^{2}), \]

where $Y_{ijk}$ is the math achievement of student i in school j and country k; $\pi_{0jk}$ is the intercept for school j in country k; and $e_{ijk}$ is a level-1 random effect that represents the deviation of student ijk’s score from the predicted score based on the student-level model.

Level 2 model (school level):

\[ \pi_{0jk} = \beta_{00k} + r_{0jk}, \]

where $\beta_{00k}$ is the mean achievement in country k, and $r_{0jk}$ is a level-2 random effect that represents the deviation of the school mean.

Level 3 model (country level):

\[ \beta_{00k} = \gamma_{000} + u_{00k}, \]

where $\gamma_{000}$ is the grand mean, and $u_{00k}$ is a level-3 random effect.
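
For reference, substituting the level-3 equation into the level-2 equation, and the result into the level-1 equation, gives the combined form of the unconditional model and the variance decomposition that underlies the variance partitioning reported with the results. The sketch below assumes mutually independent random effects. The symbols $\tau_{\pi}$ and $\tau_{\beta}$ for the level-2 and level-3 variances follow common HLM notation and are an assumption here, not notation taken from this paper.

```latex
\begin{align*}
Y_{ijk} &= \gamma_{000} + u_{00k} + r_{0jk} + e_{ijk}, \\
\operatorname{Var}(Y_{ijk}) &= \tau_{\beta} + \tau_{\pi} + \sigma^{2},
  \qquad \tau_{\pi} = \operatorname{Var}(r_{0jk}), \quad \tau_{\beta} = \operatorname{Var}(u_{00k}), \\
\text{share within schools (level 1)} &= \frac{\sigma^{2}}{\sigma^{2} + \tau_{\pi} + \tau_{\beta}}, \\
\text{share between schools (level 2)} &= \frac{\tau_{\pi}}{\sigma^{2} + \tau_{\pi} + \tau_{\beta}}, \\
\text{share between countries (level 3)} &= \frac{\tau_{\beta}}{\sigma^{2} + \tau_{\pi} + \tau_{\beta}}.
\end{align*}
```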


Conditional Model

Level 1 model (student level):

\[ Y_{ijk} = \pi_{0jk} + \sum_{p=1}^{P} \pi_{pjk}\, a_{pijk} + e_{ijk}, \qquad e_{ijk} \sim N(0, \sigma^{2}), \]

where $Y_{ijk}$ is the math achievement of student i in school j and country k; $\pi_{0jk}$ is the intercept for school j in country k; $a_{pijk}$ are p = 1, ..., P student characteristics that predict math achievement; $\pi_{pjk}$ are the corresponding level-1 coefficients that indicate the direction and strength of association between each student characteristic, $a_{p}$, and the outcome in school jk; and $e_{ijk}$ is a level-1 random effect that represents the deviation of student ijk’s score from the predicted score based on the student-level model.

Level 2 model (school level):

\[ \pi_{0jk} = \beta_{00k} + \sum_{q=1}^{Q} \beta_{0qk}\, X_{qjk} + r_{0jk}, \]

where $\beta_{00k}$ is the intercept for country k in modeling the school effect $\pi_{0jk}$; $\beta_{0qk}$ is the corresponding coefficient that represents the direction and strength of association between school characteristic $X_{qjk}$ and $\pi_{0jk}$; and $r_{0jk}$ is a level-2 random effect that represents the deviation of the school mean.

Level 3 model (country level):

\[ \beta_{00k} = \gamma_{000} + \sum_{s=1}^{S} \gamma_{00s}\, W_{sk} + u_{00k}, \]

where $\gamma_{000}$ is the intercept term in the country-level model for $\beta_{00k}$; $\gamma_{00s}$ is the corresponding level-3 coefficient that represents the direction and strength of association between country characteristic $W_{sk}$ and $\beta_{00k}$; and $u_{00k}$ is a level-3 random effect that represents the deviation from the grand mean.


Results of Research 
Descriptive Statistics 
Table 1 shows the descriptive statistics for the full sample of 314,327 students and 12,183 schools from 44 educational systems. In addition, it presents the descriptive statistics for the low-accountability system sample, which includes 8 countries with 2,131 schools and 62,689 students, and for the high-accountability system sample, which contains 6 countries with 2,846 schools and 66,909 students.

Table 1. Descriptive statistics.

                Full sample               Low-accountability system High-accountability system
Variable        Mean         SD           Mean         SD           Mean         SD
Level 1         (n=314,327)               (n=62,689)                (n=66,909)
Math            475.6        101.2        498.8        92.5         455.3        94.8
Male            0.50         0.50         0.50         0.50         0.50         0.50
SES             -0.31        1.13         -0.10        0.98         -0.49        1.23
Self-efficacy   -0.04        0.97         0.06         0.96         -0.10        0.94
Level 2         (n=12,183)                (n=2,131)                 (n=2,846)
Public          0.81         0.39         0.80         0.40         0.74         0.44
SES_mean        -0.38        0.84         -0.08        0.54         -0.58        1.00
TCMORALE        -0.09        1.00         -0.10        0.97         0.00         1.00
TMONITOR        2.09         0.94         1.48         1.00         2.39         0.79
PAREXPEC        1.82         0.71         1.53         0.65         1.95         0.72
PARPART         11.80        12.5         10.08        8.64         14.34        16.05
LEADCOM         0.11         1.03         -0.41        0.97         0.34         1.00
LEADINST        0.01         1.01         -0.36        1.00         0.08         1.03
LEADPD          0.08         1.01         -0.12        0.95         0.10         1.02
LEADTCH         0.08         1.02         -0.14        0.94         0.13         1.10
CONSEQ          2.02         0.69         1.72         0.66         2.11         0.65
Level 3         (n=44)                    (n=8)                     (n=6)
NATIONTE        0.82         0.39         0.00         0.00         1.00         0.00
REPORTIN        0.32         0.47         0.00         0.00         1.00         0.00
SANREW          0.14         0.35         0.00         0.00         1.00         0.00
LNGDP           10.18        0.58         10.58        0.88         10.06        0.78





At the student level, the average math score is 475.6 for the full sample, whereas it is 498.8 for the low-accountability system sample and 455.3 for the high-accountability system sample. That is, students’ math achievement in systems with strong external accountability measures is lower on average than in the low-accountability systems. The proportion of male students is about half across all samples. The SES index is -0.31 for the full sample, -0.10 for the low-accountability system, and -0.49 for the high-accountability system. Students’ math self-efficacy is 0.06 in the low-accountability system and -0.10 in the high-accountability system.

At the school level, 81 percent of schools are public in the full sample, compared with 80 percent and 74 percent in the low- and high-accountability systems, respectively. The school mean SES is the lowest for the high-accountability system sample and the highest for the low-accountability system sample. We included 9 internal accountability variables. Teacher morale does not differ much across the three sample groups. The high-accountability system countries carry out more teacher monitoring (2.39) than the low-accountability system countries (1.48). Parental expectation towards schools concerning academic standards is higher in the high-accountability system (1.95) than in the low-accountability system (1.53). The percentage of parents participating in school activities is also higher in the high-accountability system (14.34%) than in the low-accountability system (10.08%). The high-accountability system shows higher values in all school leadership factor variables and in the degree of consequences of teacher appraisals.

At the country level, 82% of the 44 sample countries (36 systems) implement national testing, 32% (15 systems) report school performance publicly, and 14% (6 systems) link the test results to rewards and/or sanctions toward schools. Logged GDP is 10.18 for the full sample, compared with 10.58 and 10.06 for the low- and high-accountability systems, respectively.


Multilevel analyses of accountability and math achievement 


To account for the nested data structure and disentangle the relations of system-level external accountability and school-level internal accountability with math achievement, three-level HLM analyses were conducted. Table 2 presents the HLM results for the full sample, the low-accountability system sample, and the high-accountability system sample together for comparison.


Table 2. Three-level HLM analysis results.

                Full sample                       Low-accountability system         High-accountability system
                Unconditional    Conditional      Unconditional    Conditional      Unconditional    Conditional
Fixed effects
Intercept       480.8* (7.05)    475.8* (5.01)    496.1* (9.20)                     473.6* (16.95)   462.9* (14.49)
Level-1 (Student)
Male                             6.0* (1.72)                       2.6* (0.76)                       8.2* (0.65)
SES                              9.4* (1.48)                       15.2* (0.49)                      6.1* (0.40)
Self-efficacy                    32.7* (1.93)                      38.0 (0.40)                       30.9* (0.37)
Level-2 (School)
Public                           10.0* (3.67)                      5.80* (2.30)                      5.20* (1.91)
SES_mean                         43.1* (5.05)                      37.20* (1.78)                     33.6* (1.12)
TCMORALE                         3.3* (0.61)                       1.70* (0.85)                      4.70* (0.74)
TMONITOR                         -0.55 (0.69)                      2.15* (0.93)                      -2.33* (0.93)
PARPART                          0.07 (0.06)                       -0.10 (0.09)                      0.01 (0.05)
PAREXPEC                         2.80* (1.00)                      -2.30* (1.12)                     4.80* (1.02)
LEADCOM                          0.30 (0.85)                       -1.50 (1.02)                      1.60 (1.06)
LEADINST                         -0.23 (0.65)                      0.30 (1.07)                       -0.90 (1.17)
LEADPD                           -4.60* (0.71)                     -3.10* (0.92)                     -4.80* (0.92)
LEADTCH                          1.40* (0.46)                      0.6 (1.04)                        0.60 (0.95)
CONSEQ                           -2.26* (0.68)                     -0.2 (1.30)                       -1.50 (1.17)
Level-3 (Country)
LNGDP                            8.90 (5.70)                       18.0 (13.78)                      10.30 (20.26)
REPORTIN                         -31.1* (11.70)
SANREW                           17.0 (16.53)
Variance component
L1 variance     4,960            3,945            6,104            5,604            4,585            3,694
L2 variance     3,058*           1,080*           2,093*           1,057*           2,775*           937*
L3 variance     2,159*           1,065*           650*             985*             1,939*           1,247*


Notes. Standard errors for regression coefficients are in parentheses. The analyses with five plausible values
are undertaken according to the OECD’s guideline (OECD, 2014a, p.147) with the HLM 7 software program. 


“p<.10 * p< .05 


First, in the fully unconditional model for the full sample, the only fixed effect is the grand mean of math achievement, which was 480.8 and significantly different from zero. In terms of the
variance partitioning, the largest percentage (49%) lies between students within schools (i.e., at 
level 1); a substantial, though smaller, percentage (30%) lies between schools within countries 
(i.e., at level 2); another portion (21%) lies between countries (i.e., at level 3). The variations 
between schools and between countries are statistically significant, which justifies the three- 
level HLM modelling. 
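
The variance partitioning can be checked directly from the variance components of the unconditional full-sample model in Table 2 (4,960 at level 1, 3,058 at level 2, and 2,159 at level 3). The short Python sketch below is only an arithmetic check; the function name is hypothetical.

```python
def variance_shares(level1, level2, level3):
    """Share of total variance at each level of a three-level unconditional model."""
    total = level1 + level2 + level3
    return {
        "student": level1 / total,
        "school": level2 / total,
        "country": level3 / total,
    }


# Variance components of the unconditional model, full sample (Table 2).
shares = variance_shares(4960, 3058, 2159)
for level, share in shares.items():
    print(f"{level}: {share:.1%}")
```

Rounded to whole percentages, the shares are 49%, 30%, and 21%, matching the figures reported in the text.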

In the conditional model for the full sample, only one external accountability policy is
statistically significant. That is, public reporting of school performance is negatively associated
with student math scores, while national testing and sanctions and/or rewards based on the
test results are not related to math achievement. By contrast, many internal accountability
measures have statistically significant relations with student math scores. Teacher morale is
positively associated with math achievement. Parents' expectations of schools concerning
academic standards are positively related to math achievement. School leadership for promoting
instructional improvements and professional development is negatively related to
math achievement, while teacher participation in leadership is positively associated with math
achievement. Finally, internal consequences of teacher evaluation are negatively related to
math achievement. All control variables at the student and school levels, i.e., student gender,
SES, math self-efficacy, public schools, and school mean SES, have statistically significant
relations with math achievement. Only GDP at the country level is not a statistically significant
variable. Individual and school mean SES seem to account for a considerable part of the variance
in student achievement.

Second, the results of the three-level HLM analyses with a sample of 8 countries with no
external accountability policies (low-accountability systems) and with a sample of 6 countries
with strong accountability policies (high-accountability systems) are presented in the 4th and 6th
columns of Table 2. Among the internal accountability variables, teacher morale is a significant
positive predictor of student math achievement consistently across the low- and high-accountability
systems. Teacher monitoring is positively related to math achievement in the
low-accountability systems, whereas it has a negative relation in the high-accountability systems.
Parental expectation of schools regarding academic standards has a positive association
with math achievement in the high-accountability systems. As in the full sample, school
leadership for promoting instructional improvements and professional development
is negatively related to math achievement in both low- and high-accountability
systems. On the other hand, teacher participation in leadership and the degree of consequences
of teacher evaluation are not significant predictors of math achievement in the selected low- and
high-accountability systems.

The second research question of this study concerns the coefficients of SES. The
coefficients of students' SES are 15.2 in the low-accountability systems and 6.1 in the
high-accountability systems, both of which are statistically significant. In other words, the
influence of individual SES on student math achievement is much stronger in the low-accountability
systems than in the high-accountability systems. The influence of schools' mean SES on student
achievement is also statistically significant in both low- and high-accountability systems, with
coefficients of 37.2 and 33.6, respectively. That is, the influence of schools' mean SES on
student math achievement is also stronger in the low-accountability systems, although by a
smaller margin.
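
To make the size of these gaps concrete, the sketch below compares the reported SES coefficients across the two sub-samples. It is only a descriptive illustration using the four values quoted above; the dictionary layout and variable names are assumptions, and because the two models were estimated separately, the ratios are not a formal test of the difference.

```python
# SES coefficients as reported in the text for the two sub-samples.
ses_coefficients = {
    "student_ses": {"low_accountability": 15.2, "high_accountability": 6.1},
    "school_mean_ses": {"low_accountability": 37.2, "high_accountability": 33.6},
}

comparison = {}
for name, coef in ses_coefficients.items():
    low = coef["low_accountability"]
    high = coef["high_accountability"]
    comparison[name] = {
        # Absolute gap in score points per unit of SES.
        "difference": round(low - high, 1),
        # How many times larger the low-accountability coefficient is.
        "ratio": round(low / high, 2),
    }

for name, stats in comparison.items():
    print(name, stats)
```

On these figures the student-level SES coefficient is roughly two and a half times larger in the low-accountability systems, whereas the school mean SES coefficient differs by only about a tenth.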


Discussion 


This research examined the relations of system-level external accountability and
school-level internal accountability measures with math achievement, drawing on PISA 2012 data
from 44 educational systems.

The HLM analysis results indicate that external accountability policies, except for public
reporting, are not significant predictors of student achievement. Public reporting of school
performance in nation-wide tests has a negative relationship with math achievement, which is
in contrast to the previous finding of Boarini and Ludemann (2009) with the PISA 2006 data of
OECD countries. Intuitively, it is hard to explain why a public reporting policy would have a
negative influence on student achievement. Rather, it is more plausible that low-performing
systems adopt public reporting of school performance to stimulate school efforts. This inference
is based on the fact that high-accountability systems achieve lower scores and have lower
economic status than low-accountability systems. All in all, external accountability has rather
tenuous relations with student achievement.

In contrast to external accountability, internal accountability measures have tighter
relations with student achievement. Above all, teacher morale has a significant positive
association with student achievement, consistently across all sample groups. This finding confirms
the importance of teachers' internal motivation and passion in education (Yi, 2015). Regarding
teacher monitoring, the HLM analyses with low- and high-accountability systems present
opposite findings, while teacher monitoring has no significant relation with math scores in the
full sample. A previous study reported that students in countries with more monitoring of
teacher lessons by principals performed better (Woessmann et al., 2007). In the low-accountability
systems, higher teacher monitoring is associated with higher student achievement, whereas
higher teacher monitoring is linked to lower student achievement in the high-accountability
systems. In other words, teacher monitoring in the low-accountability systems appears to
function well and to contribute to student learning. It seems that countries with no external
accountability place more importance on school-level teacher monitoring that provides direct
feedback to teachers.

This research considered four factor variables of school leadership, two of which
were found to have a statistically significant relation with math achievement.
Oddly enough, school leadership for promoting instructional improvements and professional
development (LEADPD) had a consistently negative relation with student achievement across
all sample groups. However, when the question items are examined closely, higher values of
this variable are likely associated with more student problems and disruptive behaviours, which
should be negatively related to student performance. It is notable that teacher participation in
school leadership (LEADTCH) has a positive relation with student achievement in the full
sample. Recently, distributed leadership has received attention as a determining lever for school
reforms (Spillane, 2006). Specifically, an empirical study found significant direct effects of
distributed leadership on change in the schools' academic capacity and indirect effects on
student growth rates in math (Heck & Hallinger, 2009). Thus, the results of the current study
support these previous findings.

Finally, internal consequences of teacher evaluation have a negative relation with student
achievement. It seems that rewarding teachers extrinsically based on
teacher appraisals may not be linked to teacher morale, which is positively associated with
student performance, and may do harm to educational practices that promote student learning.

The findings of the HLM analyses with sub-samples revealed that equity of student
achievement was better ensured under strong external accountability. For example, Korea,
which is classified as a high-accountability system in the sample, implemented the National
Assessment of Educational Achievement (NAEA), public reporting of school performance on
a website, and financial incentives linked to school performance for the purpose of equity
enhancement (Lee, 2017). The proportion of students who did not reach a basic proficiency level
in Korean, English and math at the school level was publicly posted on a website in Korea as of
2012. By contrast, Koretz (2017) criticizes high-stakes test-based accountability, discussing
negative effects such as inappropriate test preparation, score inflation, corruption of the ideals
of teaching, and widespread cheating. According to him, the most substantial positive effect of
test-based accountability has been improved math performance, although the improvement does not
persist for long. In spite of accumulated evidence about the overall negative effects of
test-based accountability, predominantly found in the U.S., the rationale for many other countries'
adoption of test-based accountability could be ensuring educational equity, for which this
research provides evidence.

There are still several limitations in the current research. First, the research used
cross-sectional data to examine the multilevel relations between system-level external
accountability, schools' internal accountability, and student math achievement. Therefore, the
results do not represent causal relations between explanatory variables and math achievement and
are open to reverse-causal explanations. Some countries with a strong accountability system may
have adopted accountability policy measures to address a low level of student achievement. As it
is impossible to determine the causal impact of external accountability on student achievement
with cross-sectional data, the use of longitudinal data is a direction for future research.

Second, internal accountability measures are included independently in the HLM analyses.
Originally, the concept of internal accountability represents a coherent construct, which
is based on a level of agreement among teachers on the norms, values, and expectations that
shape their work (Elmore, 2004, p.134). When the degree of internal alignment of teacher
responsibility, collective expectations of stakeholders, i.e., teachers, parents, and principals,
and accountability is high, the school supposedly has strong internal accountability. Although this
study attempted to operationalize the concept of internal accountability, it did not measure a
single unified construct of internal accountability that reflects the original idea.

Third, we separated the full sample into low- and high-accountability systems and con- 
ducted multilevel analyses separately to examine how differently internal accountability mea- 
sures operate and whether equity of achievement varies. To check the robustness of results, it is 
desirable to apply a multilevel multigroup analysis in a subsequent study. 

Finally, this research conducted a cross-country analysis without considering each
country's contextual differences. A case study of a few representative countries would contribute
to a better understanding of accountability in education.


Conclusions 


The findings of this research suggest that top-down external accountability may not be
as effective as expected, while school-based internal accountability factors are more
conducive to student achievement. In particular, it is important to encourage teachers to
sustain high morale and participate in school decision-making. When teachers are
intrinsically motivated, they commit themselves to education, which eventually benefits students'
learning and growth. As the findings show, both teacher morale and distributed leadership have
significantly positive relations with student performance. Principals need to demonstrate
distributed leadership in school so that teachers can have self-determined ownership in school
decision-making, which may in turn increase teacher morale.

Another important finding is that strong external accountability may be able to
contribute to educational equity. The effects of both individual and school SES on student
performance, indicated by the coefficients, were lower in the high-accountability systems than in
the low-accountability systems. In other words, individual and school SES are more weakly related
to student achievement in the high-accountability educational systems. This finding suggests that
external accountability measures might be beneficially utilized to narrow achievement gaps
between low- and high-SES student groups. Therefore, policy makers of each country should
consider strengths and weaknesses of external accountability in their own educational contexts.